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Abstract 

A new field of research is rapidly expanding at the crossroad between statistical physics, infor- 
mation theory and combinatorial optimization. In particular, the use of cutting edge statistical 
physics concepts and methods allow one to solve very large constraint satisfaction problems like 
random satisfiability, coloring, or error correction. 

Several aspects of these developments should be relevant for the understanding of functional 
complexity in neural networks. On the one hand the message passing procedures which are used 
in these new algorithms are based on local exchange of information, and succeed in solving some 
of the hardest computational problems. On the other hand some crucial inference problems in 
neurobiology, like those generated in multi-electrode recordings, naturally translate into hard 
constraint satisfaction problems. 

This paper gives a non-technical introduction to this field, emphasizing the main ideas at 
work in message passing strategies and their possible relevance to neural networks modelling. 
It also introduces a new message passing algorithm for inferring interactions between variables 
from correlation data, which could be useful in the analysis of multi-electrode recording data. 



1. Introduction: Constraint Satisfaction Problems 

Engineers offer encounter problems with many degrees of freedom ('variables') but 
also many constraints. The problem is to find a value of the variables which satisfies 
all constraints, or the most probable configuration of variable given the constraints and 
some a priori measure. Obvious applications are scheduling (classes, airplanes...), or job 
assignment. But similar problems occur in various branches of scientific activity, and 
are crucial in several domains. To be short we shall focus here on four of them. The 
satisfiability problem is at the core of the theory of computational complexity in computer 
science. Error correcting codes are one of the main topic of information theory. Learning 
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from examples is a basic process in cognitive neuroscience. Reconstruction of neuron 
interactions from multi-electrode recording is a problem which is becoming more and 
more important. 

All these problems can be formulated in a common language ( Mezard &: MontanariL 



20081 ). and have a strong relationship to fundamental issues in statistical physics like the 



existence of phase transition, and the possibility of glassy phases. They can also be cast 
into a some what generic formalism, b ased a graphical representation of the topology of 



constraints (|Kschischang et all 120011 ). which allows to apply a general 'message passing' 



strategy to all of them. Some of these message passing algorithms have actually shown 
strikingly good performance, solving some problems in satisfiability or perceptron learn- 
ing that are unreachable by any other algorithms. It is interesting in itself to understand 
how fundamental issues in computational complexity and information processing can be 
formulated in the same language as relevant problems in neuroscience, the main aim of 
this paper is to give some clues on these connexions. 



2. Satisfiability 

The problem of satisfiability involves N Boolean variables Xi E {T, F}. There exist 
thus 2 N possible configurations of these variables. The constraints take the special form 
of 'clauses', which are logical 'OR' functions of the variables. For instance the clause 
x\ V X2 V X3 is satisfied whenever x\ — T or X2 — T or X3 — F (the bar means negation: 
T = F and F = T). Therefore, among the 8 possible configurations of Xi,%2,%3, the 
only one which is forbidden by this clause is x\ = X21 = 0, 2:3 = 1. An instance of the 
satisfiability problem is given by the list of all the clauses it contains. The problem is to 
find a choice of the Boolean variables (called an 'assignment') such that all constraints 
are satisfied. When there exists such a choice the corresponding instance is said to be 
'SAT', otherwise it is 'UNSAT', and one typically seeks a configuration of variables which 
violates the smallest number of constraints. 

Satisfiability plays an essential role in the theory of computational complexity, be- 
cause many other difficult problems like the traveling salesman, the colouring of graphs, 
scheduling, protein folding, can be mapped ' polynomially ' to it. It was the first problem 



which has been shown to be 'NP-complete' (|Cookl . 119711 ). This means that if one could 



find an algorithm that solves satisfiability in a 'polynomial' time (growing like a power 
of N), one could also solve all these other problems in polynomial time: life would be 
much easier, in particular the life of scientists... This is generally considered unlikely, but 
the corresponding mathematical problem (whether the NP class is distinct or not from 
the 'P' class of problems which are solvable in polynomial time) is an important open 
problem in mathematics. 

The result of Cook is a worst case analysis of the satisfiability problem. However it 
appears more and more important to study 'typical case' complexity of satisfiability 
problems by introducing some classes of instances. A much studied class is the ran- 
dom '3-SAT' problem. Each clause contains exactly three variables chosen randomly in 
{xi, .., Xn}, and each variable is negated randomly with probability 1/2. This problem 
is particularly interesting because its difficulty can be tuned by varying one single con- 
trol parameter, the ratio a = M- of constraints per variable. One expects intuitively 
that for small a most instances are SAT, while for large a most of them are UNSAT. 



2 



Pi 



N 



N 




Fig. 1. Left: Probability that a formula generated from the random 3-SAT ensemble is satisfied, plotted 
versus the clause density a. The curves correspond to N = 50 (full line), N = 100 (dashed), N = 200 
(dotted). The transition between satisfiable and unsatisfiable formulas becomes sharper as N increases. 
Right: Computational effort. Plotted is the computer time (in arbitrary units) required to find a solution, 
or prove that there is no solution, versus the clause density a. From bottom to top: N = 50, 100, 150, 
200. 

Numerical experiments have confirmed this scenario, but they indicate actually a more 
interesting behavior. The probability that an instance is SAT exhibits a sharp crossover, 
from a value close to 1 to a value close to 0, at a threshold a c which is around 4.3. 
When the number of variable s N increases, the crossover b ecomes sharper and sharper 
( Kirkpatrick fc Selmanl . 1994 : Selman fc Kirkpatrick . 19961 ), as shown in F ig.!. It has 
been shown that it becomes a staircase behavior at large N (jFriedgu tL ll999h : almost all 
instances are SAT for a < a c , almost all instances are UNSAT for a > a c . This threshold 
behavior is nothing but a phase transition as one finds in physics , and has been analyzed 



Mezard et all . l2003[ ) 



using the methods of stat istical physics (jKirkpatrick fc Selmanl . Il994t iMonasson et al 
199' 



A very interesting observation illustrated in Fig.l is that the algorithmic difficulty of 
the problem, measured by the time taken by the algorithm to answer if a typical instance 
is satisfiable, also depends strongly on a: the problem is easy when a is well below or 
well above a c , and is much harder when a is close to a c . Therefore the region of phase 
transition is also the region which is difficult from the computational point of view. 



3. Error correction 



One of the fundamental problems in information theory consists in correcting transmis- 
sion errors that always occur when a message is sent throug h a communication channel 
( Richardson fc Urb ankc , 2006: iMontanari fc UrbankeL[2007h . This is done by adding re- 
dundancy. In codes based on parity constraints, the message which is sent is chosen in a 
pool of 'codewords'. A codeword is a set of N bits x\, ■ ■ ■ ,xjv, where X{ £ {0, 1}, which 
satisfies M parity check equations taking the form: 

Eii(a) H 1- Xi K ( a ) = even (1) 

For each a £ {1, • • • , M} there is one such equation, characterized by the set of bits 
ii{a), • ■ ■ which are involved in it. So the codebook, i.e. the set of codewords, is 

the set of solutions to these M constraints. It is conveniently represented graphically as 
in Fig. 2. Because the code is based on a system of linear equations, if they are designed 
to be independent, which is usually the case, the number of codewords will be 2 : 
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Fig. 2. Tanner graph representation of a parity check code. Here there are 7 bits related by three parity 
check equations. Each square represents a parity check: it enforces the constraints that the sum of bits 
connected to it must be even 



the code transmits N — M effective bits of information, the extra M bits are used to 
introduce redundancy and possibly correct errors. 

How does one correct errors? Imagine for simplicity that a codeword x = Xx, ■ • • ,xn 
is sent through a 'binary symmetric channel', which flips each bit independently with 
probability p < 1/2. The received message is y = y±, ■ ■ ■ , j/at, where yi = X4 with prob- 
ability 1 — p, and yi = 1 — Xi with probability p. Decoding means trying to infer the 
sent codeword x given the received one y. For this we write the probability that the sent 
message was a set of bits x' = x'x, ■ • ■ , x' N : 

1 M 
PU'k) = % II [(! - P) 5 *'^ +P S '' i ,i-v] II 1 (<(«) + ' ' ' + = even ) ( 2 ) 

i a—1 

where the first terms come from our knowledge of the channel, and the last ones enforces 
the fact that the sent codeword is known to satisfy the parity check equations (1(A) is 
an indicator function equal to one if the statement A is true, equal to if it is not true). 
Decoding amounts to finding the most probable codeword given the received message, 
i.e. finding the set of bits x' which maximizes P(x'\y). This is in general another difficult, 
NP-complete, problem. But we will see that it can done efficiently with a message passing 
procedure called Belief Propagation (BP) if the noise level p is not too large. 

Low Density Parity Check (LDPC) codes are based on random cons tructions in which 
the parity check equations are generated randomly (jGallagerl . 1963 ). For instance in 



regular (I, k) codes one generates equations such that each equation contains k variables, 
and each variable appears in / equations. In the large code limit N — > 00 one finds 
two phase transitions when one varies p. The first one is the threshold for decoding 
through BP: it works almost always when p < pa, it fails almost always if p > pd- The 
second one is the threshold for decoding through exact inference (computing the true 
maximum of P(x'\y)). It works almost always when p < p Cl it fails almost always if 
p > p c . For instance in a (I = 3, k = 6) regular LDPC codes, the two thresholds are 
Pd = 0.084 and p c = 0.101, while Shannon's theorem states that perfect decoding should 
be possible up to p — 0.110, and impossible above. In practice the relevant threshold 
is pd- This is because BP decoding is fast (it typically takes a time that grows linearly 
with N), while exact inference is much too slow (its time grows exponentially with N). 
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Optim ized LDPC codes can have a threshold Pd which gets quit e close to the Shannon 
limit ( Richardson fc Urbanke . 20061 : Montanari fe Urbankd . l2007h . 



4. Two problems in neuroscience 



4.1. Supervised learning 

Learning and memory tasks are believed to occur in neural systems through changes 
of synaptic strengths. Despite years of efforts, the precise way these changes are imple- 
mented in the brain for specific tasks is poorly understood. In the scenario of supervised 
learning, synaptic changes are monitored by a feedback signal carrying information about 
the success of the intended task. The perceptron classification problem is the prototypical 
example of supervised learning: given a set of training patterns . . . , £ M ), where each 
£ a is a vector of N binary variables (£f = ±1, i = 1, . . . , N), we want to learn the correct 
synaptic weights Wi leading to the classification of these inputs into two classes, C+ and 
C_ , using a feed- forward network called perceptron: 



for each a = 1, 



sign 



(3) 



where we require that a a = +1 if £ a belongs to class C+, and a a = — 1 if £ a belongs to 
class C-. 

Interestingly, this problem can be formulated as a constraint satisfaction problem, 
whose graph representation is given by the right panel of Fig. 3. The weights Wi are the 
unknown variables, and each pattern defines a constraint through Eq. (3). 

Efficient algorithms for solving thi s problem are known in the case of analog synaptic 
stengths (real Wi) (jRosenblattl . 1 19621 ) . However, recent experimental studies have shown 
that some synapses undergo changes in the form of jum ps between a finite number of sta- 
ble states (jPetersen et all Il998t lO'Connor et al. , 2005). Unfortunately, this discreteness 
makes the classification problem much harder: for inst ance, the task of learning binary 
weights Wi = ±1 is NP-complete ( Blum fc Rivestl . ll992l ). Although it has been known for 
years that a perceptron with binary synapses can in principle be trai ned to classify up to 
M = a c N random patterns in the limit of large N, with a c w 0.83 ( Krauth fc Mezardl 
Il989h . until recently no algorithm was known that could even perform this task for an 
extensive number of patterns (i.e. M = aN with N — ► oo and a fixed), emphasizing the 
difficulty of the problem. 

Like for error-correcting codes, message passing procedures provide a viable solution 
to this hard problem. The learn ing task can be handled appr oximately by algorithms 
derived from Belief Propagation ( Braunstein fc Zecchina , 20061 ) . Somewhat surprisingly, 
these techniques perform well for large random problems, even relatively close to the 
theoretical threshold M/N = a c . An on-line, biologically relevant variant of BP, which 
can still classify an extensive number of patterns, has also been showca sed as a plausible 
learning mechanism for realistic neural networks (jBaldassi et all 120071 ) . 
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Fig. 3. Left: a perceptron is a feed-forward network that takes a pattern £ as an input, and outputs a 
binary variable a. Right: training of the perceptron viewed as a constraint satisfaction problem (factor 
graph representation, see further). Weights are variables (circles), and each pattern to be classified defines 
a constraint (squares). 

4.2. Inferring neuronal couplings from multielectrode recordings 



Recent ex perimental studies i ndicat e that correlations play an important role in the 
retinal code (Schncid man et al. , 2006). In these experiments, many cells from a retinal 
ganglion patch are recorded simultaneously by a dense electrode array. It was shown 
that individual cells do not carry independent pieces of information, but rather respond 
cooperatively through effective pairwise interactions. This suggests that the stimulus is 
represented in a redundant manner reminiscent of error-correcting codes. We will see 
that the problem of learning effective pairwise interactions between neurons from the 
observed data can also be formulated in our common statistical physics language. 

Formally, the neural response of a retinal patch can be binned and represented by a 
string of binary variables. For each time bin of size St (with e.g. 5t = 20 ms), labelled 
by t, the neural response is coded by a binary word x , where x\ = +1 if neuron i has 
fired in that time bin, and x\ — — 1 otherwise. The neuronal response is stochastic in 
nature and can be described by a probability distribution P(x), which accounts for both 
stimulus and noise fluctuations. Beside its interest for itself, a correct estimation of P(x) 
is also important for the brain, as it may be used downstream the retina to evaluate the 
likelihood of spiking events, which in turn can be used to detect 'abnormal' stimuli, or 
to perform classification tasks. 

In the limit of a large integration time T, the probability distribution can in principle 
be measured through direct sampling: 

1 T 

(4) 

t=i 

In practice however, neither wc nor the brain itself can handle such a large amount of 
data. If N k, 200 is the number of cells in a patch, the number of pattern probabilities 
to be stored is 2 N w 10 60 , much more than any realistic integration time or storage 
capacity. One must thus recourse to simplifying assumptions. The simplest one is the 
independent approximation, which formally corresponds to factorizing the probability: 
P(x) = Y[iLi(^~\~ x i m i) One then just needs to measure the average mi :— (xi) of each 
neuron activity in order to reconstruct the full probability distribution (brackets denote 
expectations with respect to P{x)). Unfortunately, this approximation fails to correctly 
render some important statistical properties of the collective response, including the law 



G 



governing the total number of spikes in the population. This prompts us to take into 
account the correlative structure of the response. 

The first step beyond independence is to consider pairwise correlation functions: 

Xij = {x i x j )-(x i )(x j ), (5) 

These numbers measure the propensity of pairs of neurons to spike cooperatively rather 
than independently. An approximate probability distribution, that reproduces these cor- 
relations as well as the average firing probabilities (1 + mj )/2 with minimal constraints, 
can b e constructed using the principle of maximum entropy ( Javnesl 19491 : Schneidman et al 
20031 ). We look for a distribution p( 2 )(x) of maximum entropy 

5:=-^P (2) (a;)logP (2) (x) (6) 



that matches the one and two-point correlation functions of the observed response: 

(2) (2) fr7 \ 

Xlj-Xij, m\> =m 1 . (7) 
This distribution, which is uniquely defined, has been shown to account for most (90%) of 
the correlative structure of as many as 40 neurons recorded silmutaneously in the retina 
((Schneidman et all [2006h . 



With the help of Lagrange multipliers one can show that the Maximum Entropy dis- 
tribution takes the form: 



P (2) fe) = lcxp(5> 



i>3 



(8) 



where Z is a normalization constant. In physics terms this is a disordered Ising model. 
Usually, physicists face the problem of solving direct Ising problems, which typically con- 
sist in infering thcrmodynamical quantities, as well as magnetizations m, and correlation 
functions Xiji from the external fields hi and couplings Jij. This problem is computa- 
tionally very hard in general, and there exist no simple relation between (hi, Jjy) on the 
one hand, and (rrii,Xij) on the other: an exact estimate requires summing over the 2 N 
possible configurations x. Here we have to deal with the inverse Ising problem (inferring 
the couplings from the correlation functions), which is even harder. 

This learning problem and its variants have become increasingly important recently. 
Besides its relevance to neura l decoding, it is also useful for thinking about inference in 
protein i nteraction networks (Tkacik . 20071). the correlative structure of some catalytic 



proteins ( Socolich et all 12005 : Russ et al. . 20051) , and even the statistical properties of 



four-letters words in English ( Stephens &: BialekL 12007? ) 

A number of algorithmic strategies, mostly based on Monte-C arlo sampling 



been proposed to learn the couplings from the correlation functions (jAcklev et al 



have 



1987 



Broderick et~all 120071) . Very little is known, however, about possible neural implemen 



tations of this learning task. We will see that strategies based on message-passing ideas 
may provide leads on that question. 

5. The message passing strategy 



All the problems we have seen so far can be formulated in a common language. We 
have N variables (xx, ■ ■ ■ , xjv), taking value in some space X, and they are linked by 
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Fig. 4. Factor graph representation of satisfiability: A variable is represented by a circle. A constraint is 
represented by a square, connected with a full (resp. dashed) line to a variable when this variable appears 
as such (resp. negated) in the clause. Left hand side: The clause x\ V X2 VX3. Right hand side: the factor 
graph representing the formula: (xi V 22 V X4)A(xi V X2)A(x2 V14V xb)A(xi V12 V xs)A(xi V S3 V £5) 

constraints of probabilistic nature: each constraint ip a links the variables with labels 
i\ (a) , • • ■ , %k (1) ! hi the form of a probabilistic factor ip a (x il r a \ , • ■ ■ , x iK i a \ ) . In the case 
of hard constraints like parity checks the hard constraint takes value 1 if the check is 
satisfied, otherwise. In other cases it can take intermediate values, like for instance the 



factors 



,1-Vi 



(1 -p)8 x '. m +pS : 
defined by a probability distribution 



due to the received message in coding. The problem is 



H K {a.)) 



(9) 



Our goal is twofold. On the one hand we want to study the properties of one given 
instance: compute the marginal distributions P{xi), or find the x which maximizes P(x). 
On the other hand when P is generated from an ensemble which allows to consider the 
large N limit one would like to understand the phase diagram of the problem, like the 
thresholds pd and p c that we defined in decoding. 

Eq. (9) is not the most general probability distribution between N variables: the crucial 
point is that each ip a involves only a finite number of variables. When N is very large, 
P induces a topological structure in the space of variables that we shall exploit. The 
factor graph representation is a very convenient way of characterizing this structure. 
Each constraint ip a is represented by a function n ode (square), connected to the various 
variables (circles) which appear in the constraint (|Kschischang et all 120011 ) . An example 
for satisfiability is described in Fig. 4. The Tanner graph of code is nearly a factor graph: 
one just needs to add to it degree 1 function nodes connected to each variables, accounting 



for the factor 



(1 -P)6 x 'y i +P5 X > 1 - 1 



The factor graph of the perceptron learning 
problem is shown on Fig. 3. 

If the factor graph were a tree, it would be easy to solve our problem (for instance find 
marginals). The idea of BP is to write 'mean field' like equations that would be exact on a 
tree, and try to use them also in more general (and more interesting) cases. BP equations 
are self-consistency relations between two types of 'messages', rji—» a and rj a ^i- On trees, 
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?7i— ►<! can be interpreted as the probability measure on X4 when the factor node a has 
been removed, while rj a ^i is the probability measure on X4 when all factors neighboring 
i, expect a, have been removed. Denoting da = ( a), ■ ■ ■ , iR-(a)| the neighborh ood of 
i, and di the neighborhood of a, BP equations read ( Mezard fe MontanariLl2008 ): 



Va^i(Xi) = — — i>a( x ii(a), ■ ■ ■ ,^i K (a)) JJ_ %^( a 

Z a —n 



j£da\i 



(10) 

(11) 



b£di\a 



where the z's are normalization constants. In practice, these equations are solved by 
iteration (with parallel or random update schedules) until a fixed point is reached. Con- 
vergence is typically met in linear time. This makes BP a very fast algorithm. At the 
fixed point, the probability measure on Xi is given by: 



Pi \Xi 



(12) 



Thermodynamical quantities such as the free-energy — log Z can also be derived ijMezard fc Montanari , 
2008) from the messages (rji-> a , T] a ^i). 

Note that while convergence and accuracy are garanteed when the graph is a tree, BP 
equations sometimes fail to find the correct fixed point or provide a poor approximation 
of the probability measure when the graph is loopy. This can happen when there are 
many small loops, or when correlations build up across the graph. To ove rcome the first 



issue , generalized Belief Propagations (GBP) schemes have been proposed (jYedidia et al 
l200lh . The second issue, which is related to the partition of the measure P into a mul- 
tiplicity of discon nected 'states', can be handled by an extension of BP called Survey 



2005). 



Propagation (SP) (M ezard fc Zecchinal . 120021 : iBraunstein et al. 

As we mentionned earlier, BP is the best known solver for LDPC codes, provided 
that the channel noise is not too high. While BP can also handle random satisfiability 
problems for small enough clause densities a, SP becomes necessary as one gets to higher 
a, where problems become hard. SP can find solutions to 3-SAT in stances for up to 10 7 
variables at a = 4.25, very close to the satisfiability threshold a c (jMezard fc Zecchinal . 



2002) 



Beside their efficiency, the appeal of message passing procedures like BP resides in 
their local nature: information is propagated along the edges of the graph, and each 
message is updated using only other messages coming into the same node. This makes 
them highly amenable to parallclization. It is also tempting to make the connection with 
learning mechanisms in the brain, whereby synaptic strengths change only according to 
the activity of its neighboring neurons. And indeed, the engineering of BP/SP-inspired 
algorithms for the perceptron show that learning rules using only post a nd pre-synaptic 
activ ities, as well as error signals, suffice to implement efficient learning (jBaldassi et al 



2007) 
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Fig. 5. Factor graph representation of the Ising model Eq. (8). 

6. An application: the inverse Ising problem 



We now study a novel application of message-passing to the inverse Ising problem 
introduced in Section 4.2. As in the perceptron, the proposed method relies on local 
exchanges of information between variables. 

Let us start with the direct problem, whose factor graph is represented in Fig. 5. 
BP can be used to compute probability measures on single variables (i.e. local magne- 
tizations mi), but it docs not give information on the two-point correlation functions 
Xij- To access this information wc will need to go a bit further. We shall make use of 
the fluctuation-dissipation relation, which offers a convenient way to estimate pairwise 
correlation functions using the derivatives of magnetizations: 

_ _ drrij _ drrij 

Xij - xji - dhj - dhi • l^J 

But first we need to adapt the language of BP to the Ising model. The binary nature 
of Ising variables allows us to reduce BP messages to single numbers: 

m^a{Xi) = , Va^i(Xi) = ^ ^ ) 

These messages rrii_^ a and m a ^i are called 'cavity' magnetizations, as they are defined 
on amputed graphs. Note that when factor a is just a field contribution e hiXi , the message 
is trivial. When factor a is a interaction contribution e Ji ' XiX \ we rewrite for convenience 
rrii^j := mi^ a . 

The iteration of BP equations, along with Eq. (12), allows to compute the mi's. We 
now define a new type of messages, called cavity susceptibilities, and defined as: 

Xi^ k - (15) 

These messages are tied by a new set of self-consistency equations, called 'susceptibility 
propagation' equations, simply obtained as the derivatives of BP equations (10), (11) 
with respect to {h^}- They reflect how small local perturbations can propagate through 
the graph to remote variables, even when these variables and the perturbation are not 
directly linked. As in BP, these equations can be solved iteratively. When convergence is 
reached, the total susceptibilities \ij are given by derivatives of Eq. (12) with respect to 
{/'/!• 
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Fig. 6. Mean error a 2 = yj{(J' i: j — Jij) 2 ) of the susceptibility propagat ion reconstruction alg orithm 
presented in the text, compared against that of two mean- field schemes (Kappen & Rodrigucg, Il998l) 
(Weiss: naive mean-field, TAP: mean-field with back-reaction term). 

This susceptibility propagation algorithm has the same advantages and downsides as 
BP. While being relatively fast, it relies on the assumption that the behaviour of the 
model is not far from that of a tree factor graph. This can be true if the graph is sparse 
and locally tree-like, or if the interactions are small enough. 

Susceptibility propagation (approximately) solves the direct Ising problem (hi, Jij) — > 
(rrii , Xij ) ■ How can we use it to solve the inverse problem? The key is to realize that 
although susceptibility equations are self-consistency equations on the messages, they can 
also be viewed as self-consistency equations on the 'inputs' (hi, Jij) by simply extracting 
them from the belief and susceptibility propagation equations. For instance, on can write 
the following update rule for Jy , 

Xij fi^i — >j"lj — >i 



tanh Jij = 



(16) 

1 - Xij m i^j m i~>i 

The rest of the iteration equations remains essentially unchanged, with the notable dif- 
ference that now (rrii,Xij) are treated as constants, while (hi, Jij) become the unknown 
variables to be updated. 

We have tested our algorithm on synthetic data. First we have considered a spin glass 
with random gaussian couplings Jij of zero mean and variance J 2 /N, with no magnetic 
fields, hi — 0. This is the Shcrrington-Kirkpatrick model. Small problems (N = 10, 15, 20) 
are drawn at random and solved exactly by exhaustive enumeration. Then our algorithm 
tries to reconstruct the couplings Jy from the correlation functi ons. Its performance is 
show n on Fig. 6, and is contrasted with other mean-field methods (jKappen fc Rodriguezl . 
19981 ). Interestingly, all mean- field schemes fail for J > 1, where the system notoriously 
becomes 'glassy', with the onset of metastable states. 

Perhaps the power of susceptibility propagation is better shown on examples where it 
is supposed to be exact, namely, when the underlying topology is a tree. For simplicity 
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Fig. 7. Reconstruction of a small linear chain. Knowing only the correlation functions, and with no prior 
knowledge on the topology of the graph, the algorithm can infer both the structure and the numerical 
values of the interaction strengths. Here is shown the progress of the algorithm. The gray level of each edge 
codes for the couplings strength Jij. The algorithm is started with random initial conditions (leftmost 
graph). Next are shown, from left to right, the couplings after 3, 6, 9 and 20 iterations. 

we have tested our algorithm on linear chains. Provided that the couplings are not too 
large, we can reconstruct both the topology of the linear chain (i.e. the order of variables 
on the chain), and the exact strength of interactions between neighbours (see Fig. 7). 
When the couplings are too large, the exact solution becomes unstable. This can partially 
be remedied, however, by making zero couplings more attractive in the equations, thus 
stabilizing sparse topologies. 

A more systematic method for treating sparse networks is however needed. With it, 
susceptibility propagation could be used as a comprehensive network reconstruction al- 
gorithm, with possible applications to the inference of Bayesian networks, Markov chains 
with arbitrary topologies, or in population genetics. 

7. Conclusion 

The message passing strategy often provides the most efficient algorithms for solving 
hard constraint satisfaction problems, or for inference in graphical models. This is espe- 
cially true when the factor graph representing the problem has a local tree- like structure. 
It is particularly remarkable that some very difficult problems, which cannot be solved 
by other methods, are solved by procedures of local exchange of messages between the 
variables and constraints. It is likely that recent developments in this domain can have 
some impact in neuroscience, in at least two directions. First of all because some major 
challenges in neuroscience, linked to the analysis of experimental data, can themselves be 
formulated in terms of graphical or constraint satisfaction problems. Secondly because 
the mere fact that distributed local information exchange systems achieve this task is 
very appealing in the perspective of information processing by the brain. 
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